
Add missing INLINE's on EnumContainers #5499

Merged
merged 2 commits into from
Dec 12, 2024
Conversation

ChrisPenner
Contributor

Overview

I was partaking in my now-daily ritual of staring into the void (a.k.a. GHC Core), and guess what I found!

The difference here is substantial, and it implies that the places where we're still using EnumContainers probably deserve another look; at the very least I should re-examine the Core to explain where this much of a speed difference comes from.

It's really important to be careful about INLINE pragmas when helper functions live in a different module from the one they're used in, since GHC typically won't inline across module boundaries unless asked to.
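A minimal sketch of the point above (module and function names are illustrative, not from this PR): without the pragma, GHC may compile the helper once in its home module, and call sites in other modules pay a call instead of getting an inlined, specialized copy.

```haskell
module Util (bucket) where

-- A tiny helper used from other modules. The INLINE pragma makes GHC
-- keep the unfolding in the interface file, so a call site like
-- `map bucket xs` in another module can inline it and optimize further.
bucket :: Int -> Int
bucket n = n `div` 64
{-# INLINE bucket #-}
```

Without `{-# INLINE #-}`, GHC decides for itself whether the unfolding is small enough to export; marking it removes the guesswork.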

Implementation notes

Just add a bunch of INLINE annotations to EnumMap methods
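The EnumContainers pattern can be sketched roughly like this (names here are illustrative, not the exact Unison API): a newtype over `IntMap` keyed via `fromEnum`, where each wrapper method needs its own INLINE pragma because it is defined in a separate module from its call sites.

```haskell
module EnumContainers (EnumMap, lookupEM, insertEM) where

import qualified Data.IntMap.Strict as IM

-- A map from any Enum key to values, backed by an IntMap.
newtype EnumMap k v = EM (IM.IntMap v)

lookupEM :: Enum k => k -> EnumMap k v -> Maybe v
lookupEM k (EM m) = IM.lookup (fromEnum k) m
{-# INLINE lookupEM #-}

insertEM :: Enum k => k -> v -> EnumMap k v -> EnumMap k v
insertEM k v (EM m) = EM (IM.insert (fromEnum k) v m)
{-# INLINE insertEM #-}
```

Each wrapper is a thin shim over `Data.IntMap.Strict`, so inlining it lets GHC see through both the newtype and the `fromEnum` conversion at the call site.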

Benchmarks:

| Benchmark | trunk | new |
| --- | --- | --- |
| fib1 | 313.5µs | 206.985µs |
| fib2 | 2.241831ms | 1.820296ms |
| fib3 | 2.651751ms | 2.060708ms |
| Decode Nat | 341ns | 245ns |
| Generate 100 random numbers | 207.506µs | 146.037µs |
| List.foldLeft | 2.020392ms | 1.538192ms |
| Count to 1 million | 124.85875ms | 80.50275ms |
| Json parsing (per document) | 258.366µs | 214.227µs |
| Count to N (per element) | 190ns | 119ns |
| Count to 1000 | 191.382µs | 119.902µs |
| Mutate a Ref 1000 times | 316.96µs | 204.624µs |
| CAS an IO.ref 1000 times | 426.228µs | 287.614µs |
| List.range (per element) | 326ns | 244ns |
| List.range 0 1000 | 345.768µs | 255.853µs |
| Set.fromList (range 0 1000) | 1.584278ms | 1.194793ms |
| Map.fromList (range 0 1000) | 1.160257ms | 845.532µs |
| NatMap.fromList (range 0 1000) | 4.869621ms | 3.453725ms |
| Map.lookup (1k element map) | 2.539µs | 1.709µs |
| Map.insert (1k element map) | 6.829µs | 4.899µs |
| List.at (1k element list) | 286ns | 188ns |
| Text.split / | 35.598µs | 26.241µs |

@ChrisPenner ChrisPenner marked this pull request as ready for review December 11, 2024 04:32
@ChrisPenner ChrisPenner requested a review from dolio December 11, 2024 17:46
@ChrisPenner
Contributor Author

From staring at the Core I was able to determine that the difference is actually because CCache is not getting unboxed on this branch.

Something about inlining these convinced GHC NOT to unbox the CCache, which improves speed by not passing the extra dozen arguments to the eval worker.

Dan and I confirmed we get a similar speedup by just removing the !s on CCache in eval and exec.

That said, this PR should probably go in anyway, since there's no reason NOT to inline these, and clearly it's helping GHC be at least a little smarter :)
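A rough illustration of the unboxing behavior described above, with hypothetical stand-in types (not the real CCache): a bang on a product-typed argument lets GHC's worker/wrapper transformation split the record into its fields, so the worker receives one argument per field. For a record with many fields that means passing a dozen extra arguments on every call; without the bang (or when inlining changes GHC's analysis), the record stays as a single pointer.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Stand-in for a many-field environment record like CCache.
data Env = Env !Int !Int !Int

-- With the bang, strictness analysis may produce a worker taking the
-- three Int fields separately; for a large record, that multiplies the
-- argument count at every call site of the worker.
evalStep :: Env -> Int -> Int
evalStep !env n = go env n
  where
    go (Env a b c) m = a + b + c + m
```

Whether the unboxing is a win depends on how many fields the record has and how often they are all needed, which is why removing the `!`s helped here.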

The interpreter is much faster when it doesn't get unboxed.
@ChrisPenner ChrisPenner merged commit f9cc70e into trunk Dec 12, 2024
32 checks passed
@ChrisPenner ChrisPenner deleted the cp/inline-ec branch December 12, 2024 19:12